Minimum variance distortionless response on a warped frequency scale

Authors

  • Matthias Wölfel
  • John W. McDonough
  • Alexander H. Waibel
Abstract

In this work we propose a time-domain technique to estimate an all-pole model based on the minimum variance distortionless response (MVDR) using a warped short-time frequency axis such as the Mel scale. The use of the MVDR eliminates the overemphasis of harmonic peaks typically seen in medium- and high-pitched voiced speech when spectral estimation is based on linear prediction (LP). Moreover, warping the frequency axis prior to MVDR spectral estimation ensures that more parameters in the spectral model are allocated to the low-frequency than to the high-frequency regions of the spectrum, thereby mimicking the human auditory system. In a series of speech recognition experiments on the Switchboard Corpus (spontaneous English telephone speech), the proposed approach achieved a word error rate (WER) of 32.1% for female speakers, compared with the 33.2% WER obtained by the usual combination of Mel warping and linear prediction.
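The relationship between the LP and MVDR envelopes mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on a plain (unwarped) autocorrelation sequence, and the function names `levinson`, `lp_spectrum`, and `mvdr_spectrum` are hypothetical. It relies on the standard identity that the reciprocal of the order-M MVDR spectrum equals the sum of the reciprocals of the LP spectra of orders 0 to M. Applying the same steps on a warped scale would presumably mean replacing the input with warped autocorrelation coefficients, which this sketch does not attempt.

```python
import cmath
import math

def levinson(r, order):
    """Levinson-Durbin recursion on a real autocorrelation sequence r.

    Returns a list of (a, err) pairs for model orders 0..order, where
    a = [1, a_1, ..., a_m] is the prediction-error filter and err is the
    prediction-error power of that order.
    """
    a = [1.0]
    err = r[0]
    models = [(a, err)]
    for m in range(1, order + 1):
        # Reflection coefficient for order m.
        k = -sum(a[i] * r[m - i] for i in range(m)) / err
        padded = a + [0.0]
        a = [padded[i] + k * padded[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
        models.append((a, err))
    return models

def inverse_lp_spectrum(a, err, omega):
    """|A(e^{j omega})|^2 / err, the reciprocal of the LP spectrum."""
    A = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(a))
    return abs(A) ** 2 / err

def lp_spectrum(a, err, omega):
    """All-pole LP spectrum err / |A(e^{j omega})|^2."""
    return 1.0 / inverse_lp_spectrum(a, err, omega)

def mvdr_spectrum(models, omega):
    """MVDR spectrum of the highest order in `models`.

    Uses 1 / P_MVDR = sum over orders m of 1 / P_LP^(m).
    """
    return 1.0 / sum(inverse_lp_spectrum(a, e, omega) for a, e in models)

if __name__ == "__main__":
    # Illustrative autocorrelation values (an assumption, not data from the paper).
    r = [1.0, 0.5, 0.1]
    models = levinson(r, order=2)
    for omega in (0.0, math.pi / 2, math.pi):
        a, err = models[-1]
        print(omega, lp_spectrum(a, err, omega), mvdr_spectrum(models, omega))
```

Because the MVDR estimate averages the inverse LP spectra over all lower orders, it tends to produce a smoother envelope than the order-M LP spectrum alone, which is consistent with the reduced emphasis on harmonic peaks described above.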

Similar resources

Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations

Accuracy of speaker verification is high under controlled conditions but falls off rapidly in the presence of interfering sounds. This is because spectral features, such as Mel-frequency cepstral coefficients (MFCCs), are sensitive to additive noise. MFCCs are a particular realization of warped-frequency representation with low-frequency focus. But there are several alternative, potentially mor...

Warping and Scaling of the Minimum Variance Distortionless Response

Spectral estimation based on the minimum variance distortionless response (MVDR) is well-known in the signal processing literature and has been shown to be superior to linear prediction for robust speech recognition. In this work we propose two techniques to improve the resolution and the robustness of the MVDR spectral estimate: The first is a time-domain technique to estimate an all-pole mode...

Speaker dependent model order selection of spectral envelopes

This work introduces a maximum-likelihood based model order (MO) selection technique for spectral envelopes to apply speaker dependent adaptation in the feature-space similar to vocal tract length normalization. Speech recognition systems based on spectral envelopes are using a fixed MO for the underlying linear parametric model. Using a fixed MO over different speakers or channels might not be...

Frame based model order selection of spectral envelopes

Spectral envelopes, using (warped or perceptual) linear prediction or minimum variance distortionless response for the underlying linear parametric model, are widely used in speech recognition systems where the frequency resolution, namely the model order (MO), of the spectrum is kept constant. Modeling different types of phonemes such as vowels or fricatives with the same frequency resolution ...

Speaker identification using warped MVDR cepstral features

It is common practice to use similar or even the same feature extraction methods for automatic speech recognition and speaker identification. While the front-end for the former must preserve phoneme discrimination and compensate for speaker differences to some extent, the front-end for the latter has to preserve the unique characteristics of individual speakers. It seems, therefore, c...

Publication date: 2003